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Abstract — We propose a novel algorithm for compressive 
imaging that exploits both the sparsity and persistence across 
scales found in the 2D wavelet transform coefficients of natural 
images. Like other recent works, we model wavelet structure 
using a hidden Markov tree (HMT) but, unlike other works, ours 
is based on loopy belief propagation (LBP). For LBP, we adopt a 
recently proposed "turbo" message passing schedule that alter- 
nates between exploitation of HMT structure and exploitation of 
compressive-measurement structure. For the latter, we leverage 
Donoho, Maleki, and Montanari's recently proposed approximate 
message passing (AMP) algorithm. Experiments with a large im- 
age database suggest that, relative to existing schemes, our turbo 
LBP approach yields state-of-the-art reconstruction performance 
with substantial reduction in complexity. 

I. Introduction 

In compressive imaging [1], we aim to estimate an image 
x 6 M. N from M < N noisy linear observations y € IR M , 

y = &x + w = ^^9 + w, (1) 

assuming that the image has a representation 9 £ WL N in some 
wavelet basis (i.e., x = *&9) containing only a few (K) 
large coefficients (i.e., K <C N). In (1), $ e R^ IxN is a 
known measurement matrix and w ~ 7V(0,<7 2 J) is additive 
white Gaussian noise. Though M < N makes the problem 
ill-posed, it has been shown that x can be recovered from 
y when K is adequately small and 3? is incoherent with \P 
[1]. The wavelet coefficients of natural images are known to 
have an additional structure known as persistence across scales 
(PAS) [2], which we now describe. For 2D images, the wavelet 
■ coefficients are naturally organized into quad-trees, where each 
coefficient at level j acts as a parent for four child coefficients 
at level j + 1. The PAS property says that, if a parent is very 
small, then all of its children are likely to be very small; 
similarly, if a parent is large, then it is likely that some (but 
not necessarily all) of its children will also be large. 

Several authors have exploited the PAS property for com- 
pressive imaging [3]-[6]. The so-called "model-based" ap- 
proach [3] is a deterministic incarnation of PAS that leverages 
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a restricted union-of-subspaces and manifests as a modified 
CoSaMP [7] algorithm. Most approaches are Bayesian in 
nature, exploiting the fact that PAS is readily modeled by 
a hidden Markov tree (HMT) [8]. The first work in this 
direction appears to be [4], where an iteratively re- weighted 
l\ algorithm, generating an estimate of x, was alternated with 
a Viterbi algorithm, generating an estimate of the HMT states. 
More recently, HMT-based compressive imaging has been 
attacked using modern Bayesian tools [9]. For example, [5] 
used Markov-chain Monte-Carlo (MCMC), which is known 
to yield correct posteriors after convergence. For practical 
image sizes, however, convergence takes an impractically long 
time, and so MCMC must be terminated early, at which 
point its performance may suffer. Variational Bayes (VB) 
can sometimes offer a better performance/complexity tradeoff, 
motivating the approach in [6]. Our experiments indicate 
that, while [6] indeed offers a good performance/complexity 
tradeoff, it is possible to do significantly better. 

In this paper, we propose a novel approach to HMT-based 
compressive imaging based on loopy belief propagation [10]. 
For this, we model the coefficients in 9 as conditionally 
Gaussian with variances that depend on the values of HMT 
states, and we propagate beliefs (about both coefficients and 
states) on the corresponding factor graph. A recently proposed 
"turbo" messaging schedule [11] suggests to iterate between 
exploitation of HMT structure and exploitation of observa- 
tion structure from (1). For the former we use the standard 
sum-product algorithm [12], [13], and for the latter we use 
the recently proposed approximate message passing (AMP) 
approach [14]. The remarkable properties of AMP are 1) a 
rigorous analysis (as M, N — s- oo with M/N fixed, under i.i.d 
Gaussian <&) [15] establishing that its solutions are governed 
by a state-evolution whose fixed points — when unique — yield 
the true posterior means, and 2) very low implementational 
complexity (e.g., AMP requires one forward and one inverse 
fast- wavelet- transform per iteration, and very few iterations). 

We consider two types of conditional-Gaussian coefficient 
models: a Bernoulli-Gaussian (BG) model and a two-state 
Gaussian-mixture (GM) model. The BG model assumes that 
the coefficients are either generated from a large-variance 
Gaussian distribution or are exactly zero (i.e., the coefficients 
are exactly sparse), whereas the GM model assumes that the 
coefficients are generated from either a large-variance or a 
small-variance Gaussian distribution. Both models have been 
previously applied for imaging, e.g., the BG model was used 
in [5], [6], whereas the GM model was used in [4], [8]. 

Although our models for the coefficients 9 and the cor- 
responding HMT states involve statistical parameters like 
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Fig. 1. Left: The camera-man image. Center: The corresponding transform 
coefficients, demonstrating PAS. Right: An illustration of quad-tree structure. 



variance and transition probability, we learn those parameters 
directly from the data. To do so, we take a hierarchical 
Bayesian approach — similar to [5], [6] — where these statis- 
tical parameters are treated as random variables with suitable 
hyperpriors. Experiments on a large image database show that 
our turbo-AMP approach yields state-of-the-art reconstruction 
performance with substantial reduction in complexity. 

The remainder of the paper is organized as follows. Sec- 
tion II describes the signal model, Section III describes the 
proposed algorithm, Section IV gives numerical results and 
comparisons with other algorithms, and Section V concludes. 

Notation: Above and in the sequel, we use lowercase bold- 
face quantities to denote vectors, uppercase boldface quantities 
to denote matrices, J to denote the identity matrix, (-) T to 
denote transpose, and ||a;||2 — V x T x. We use Pe|s(0| s ) 
to denote the probability density 1 function (pdf) of random 
variable given the event S = s, where often the subscript 
"e|s" i s omitted when there is no danger of confusion. We 
use Af(x;m, S) to denote the A^-dimensional Gaussian pdf 
with argument x, mean m, and covariance matrix S, and we 
write x ~ Af(m, X) to indicate that random vector x has 
this pdf. We use E{-} to denote expectation, Pr{£} to denote 
the probability of event £, and £(•) to denote the Dirac delta. 
Finally, we use oc to denote equality up to a multiplicative 
constant. 

II. Signal Model 

Throughout, we assume that & represents a 2D wavelet 
transform [2], so that the transform coefficients 8 = 
[6\, ... , 6>jv] t can be partitioned into so-called "wavelet" coef- 
ficients (at indices n G W) and "approximation" coefficients 
(at indices n G .4). The wavelet coefficients can be further 
partitioned into several quad-trees, each with J > 1 levels 
(see Fig. 1). We denote the indices of all coefficients at level 
j G {0, . . . , J— 1} of these wavelet trees by VVy, where j = 
refers to the root. In the interest of brevity, and with a slight 
abuse of notation, we refer to the approximation coefficients 
as level "—1" of the wavelet tree (i.e., A = W-i). 

As discussed earlier, two coefficient models are considered 
in this paper: Bernoulli-Gaussian (BG) and two-state Gaussian 
mixture (GM). For ease of exposition, we focus on the BG 
model until Section III-E, at which point the GM case is 
detailed. In the BG model, each transform coefficient 9 n is 
modeled using the (conditionally independent) prior pdf 

p(6n | «„) = SnM{e n ; 0, a 2 ) + (1 - s n )S(6 n ), (2) 

1 or the probability mass function (pmf), as will be clear from the context. 
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Fig. 2. Factor graph representation of the signal model. The variables s i and 
S(j are wavelet states at the roots of two different Markov trees. The variable 
s§ is an approximation state and hence is not part of any Markov tree. The 
remaining s„ are wavelet states at levels j > 0. For visual simplicity, a binary- 
tree is shown instead of a quad-tree, and the nodes representing the statistical 
parameters p, 7T^ 1; 7Tq , {p 3 ■, 71-j 1 , i"^ }, as well as those representing their 
hyperpriors, are not shown. 



where s n G {0, 1} is a hidden binary state. The approximation 
states {s Il }„ e w_ 1 are assigned the apriori activity rate -k 1 _ x = 
Pr{s„ = l\n G WLi}, which is discussed further below. 
Meanwhile, the root wavelet states {sn}riew are assigned 
7Tq = Pr{s„ = 1 1 n G Wo}- Within each quad-tree, the states 
have a Markov structure. In particular, the activity of a state 
at level j + 1 is determined by its parent's activity (at level j) 



and the transition probabilities {it 
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}, where 7n denotes 



the probability that the child's state equals given that his 
parent's state also equals 0, and tt^ 1 denotes the probability 
that the child's state equals 1 given given that his parent's 
state also equals 1. The corresponding factor graph is shown 
in Fig. 2. 

We take a hierarchical Bayesian approach, modeling the 

statistical parameters cr 2 , {c^j^Li, ti"1i, ttoj i 7 '") 1 ' 7r j°}/=o as 
random variables and assigning them appropriate hyperpriors. 
Rather than working directly with variances, we find it more 
convenient to work with precisions (i.e., inverse-variances) 
such as p = a~ 2 . We then assume that all coefficients at 
the same level have the same precision, so that pj = er~ 2 for 
all n G Wj. To these precisions, we assign conjugate priors 
[16], which in this case take the form 



p ~ Gamma(a, b) 
Pj ~ Gamma(aj, 



(3) 
(4) 



where Gamma(p; a, 6) = j^b a p a 1 exp(— bp) for p > 0, 
and where a, b, {aj, bj}jz^ l are hyperparameters. (Recall that 
the mean and variance of Gamma(o, b) are given by a/6 and 
a/6 2 , respectively [16].) For the activity rates and transition 
parameters, we assume 
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Bcta(c, d) 
Bcta(c, d) 
Bcta(cj, dj) 
Beta(c^, dj), 



(5) 
(6) 
(7) 
(8) 
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where Bcta(p; c, d) = r(c)rtd) P C ^ ~ P) d 1 < an d where 
c,d,c,d,{cj,dj,Cj,dj}j~l are hyperparameters. (Recall that 
the mean and variance of Bcta(c, d) are given by and 
(c+aWe+d+I) ' respectively [16].) Our hyperparameter choices 
are detailed in Section IV. 

III. Image Reconstruction 

To infer the wavelet coefficients 9, we would ideally like 
to compute the posterior pdf 

p(d\y) cc J2p(y\e,s)p(9,s) (9) 

s N M 

= ji p( g ) n p(8n\s n ) n^i fl )'( 10 ) 

s s ^v^ / n —\ v m— 1^ v 

= h(s) = f n (6 n ,s n ) = g m {0) 

where oc denotes equality up to a multiplicative constant. 
For the BG coefficient model, /„(#„, s„) is specified by 
(2). Due to the white Gaussian noise model (1), we have 
gm(9) = A/"(y m ; cl^O, a 2 ), where denotes the m th row 
of the matrix A = 

A. Loopy Belief Propagation 

While exact computation of p(9 \ y) is computationally pro- 
hibitive, the marginal posteriors {p(9 n \ y)} can be efficiently 
approximated using loopy belief propagation (LBP) [10] on 
the factor graph of Fig. 2, which uses round nodes to denote 
variables and square nodes to denote the factors in (10). In do- 
ing so, we also obtain the marginal posteriors {p(s n \ y)}. For 
now, we treat statistical parameters p, ir_x, ttq, {pj, 7rj , Tr" }, 
as if they were fixed and known, and we detail the procedure 
by which they are learned in Section III-D. 

In LBR messages are exchanged between the nodes of the 
factor graph until they converge. Messages take the form of 
pdfs (or pmfs), and the message flowing to/from a variable 
node can be interpreted as a local belief about that variable. 
According to the sum-product algorithm [12], [13] the mes- 
sage emitted by a variable node along a given edge is (an 
appropriate scaling of) the product of the incoming messages 
on all other edges. Meanwhile, the message emitted by a 
function node along a given edge is (an appropriate scaling of) 
the integral (or sum) of the product of the node's constraint 
function and the incoming messages on all other edges, where 
the integration (or summation) is performed over all variables 
other than the one directly connected to the edge along which 
the message travels. When the factor graph has no loops, 
exact marginal posteriors result from two (i.e., forward and 
backward) passes of the sum-product algorithm [12], [13]. 
When the factor graph has loops, however, exact inference 
is known to be NP hard [17] and so LBP is not guaranteed to 
produce correct posteriors. Still, LBP has shown state-of-the- 
art performance in many applications, such as inference on 
Markov random fields [18], turbo decoding [19], LDPC de- 
coding [20], multiuser detection [21], and compressive sensing 
[14], [15], [22], [23]. 




Fig. 3. The turbo approach yields a decoupled factor graph. 

B. Message Scheduling: The Turbo Approach 

With loopy belief propagation, there exists some freedom 
in how messages are scheduled. In this work, we adopt the 
"turbo" approach recently proposed in [11]. For this, we split 
the factor graph in Fig. 2 along the dashed line and obtain the 
two decoupled subgraphs in Fig. 3. We then alternate between 
belief propagation on each of these two subgraphs, treating 
the likelihoods on {.s n } generated from belief propagation on 
one subgraph as priors for subsequent belief propagation on 
the other subgraph. We now give a more precise description of 
this turbo scheme, referring to one full round of alternation as 
a "turbo iteration." In the sequel, we use v^^gi-) to denote 
the message passed from node A to node B during the t th 
turbo iteration. 

The procedure starts at t = 1 by setting the "prior" 
pmfs {^'(O} in accordance with the apriori activity rates 
Pr{s„ = 1} described in Section II. LBP is then iterated (to 
convergence) on the left subgraph in Fig. 3, finally yielding the 
messages {^"^ Sii (•)}■ We note that the message vf^ Sn (s n ) 
can be interpreted as the current estimate of the likelihood 2 
on s n , i.e., p(y \ s n ) as a function of s n . These likelihoods 
are then treated as priors for belief propagation on the right 
subgraph, as facilitated by the assignment d$(.) = vy (.) 
for each n. Due to the tree structure of HMT, there are no loops 
in right subgraph (i.e., inside the "h" super-node in Fig. 3), and 
thus it suffices to perform only one forward-backward pass of 
the sum-product algorithm [12], [13]. The resulting leftward 
messages v%T* (.) are subsequently treated as priors for belief 
propagation on the left subgraph at the next turbo iteration, as 
facilitated by the assignment hn (•) = v \^\ s (•)■ The process 
then continues for turbo iterations t = 2,3,4,..., until the 
likelihoods converge or a maximum number of turbo iterations 
has elapsed. Formally, the turbo schedule is summarized by 

d^(s n ) = uf^ Sn (s n ) (11) 

In the sequel, we refer to inference of {s n } using 
compressive-measurement structure (i.e., inference on the left 
subgraph of Fig. 3) as soft support-recovery (SSR) and in- 
ference of {s n } using HMT structure (i.e., inference on the 

2 In turbo decoding parlance, the likelihood v f > _ ¥s (s„) would be referred 
to as the "extrinsic" information about s„ proSuced by the left "decoder", 
since it does not directly involve the corresponding prior h^(s n ). Similarly, 
the message ^l_. s (sn) would be referred to as the extrinsic information 
about s n produced %y the right decoder. 
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right subgraph of Fig. 3) as soft support-decoding (SSD). SSR 
details are described in the next subsection. 



C. Soft Support-Recovery via AMP 

We now discuss our implementation of SSR during a 
single turbo iteration t. Because the operations are invariant 
to t, we suppress the t-notation. As described above, SSR 
performs several iterations of loopy belief propagation per 
turbo iteration using the fixed priors A„ = h n (s n = 1). 
This implies that, over SSR's LBP iterations, the message 

v fn->9n(-) * s fi xe d at 

v fn ^e n (6 n ) = X n M(0 n ;0,a 2 n ) + (1 - \ n )5(6 n ). (13) 

The dashed box in Fig. 3 shows the region of the factor 
graph on which messages are updated during SSR's LBP 
iterations. This subgraph can be recognized as the one that 
Donoho, Maleki, and Montanari used to derive their so-called 
approximate message passing (AMP) algorithm [14]. While 
[14] assumed an i.i.d Laplacian prior for 0, the approach for 
generic i.i.d priors was outlined in [23]. Below, we extend 
the approach of [23] to independent new-identical priors (as 
analyzed in [24]) and we detail the Bernoulli-Gaussian case. 
In the sequel, we use a superscript-i to index SSR's LBP 
iterations. 

According to the sum-product algorithm, the fact that 
i/f n ^yg n (.) is non-Gaussian implies that v\ (.) is also 
non-Gaussian, which complicates the exact calculation of 
the subsequent messages v % (.) as defined by the sum- 
product algorithm. However, for large N, the combined effect 
of {v l e ^ g (.)}^ =1 at the g m nodes can be approximated as 
Gaussian using central-limit theorem (CLT) arguments, after 
which it becomes sufficient to parameterize each message 



(.) by only its mean and variance: 



Ie n BnV l n ^>gS Bn ) 
L (On - MmJ 2 v \„ 



.(*»)■ 



(14) 
(15) 



Assuming that the values Af n satisfy 



M 



(22) 



1=1 



which occurs, e.g., when M is large and {A; n } are generated 
i.i.d with variance l/M, we have c\ n c\ = i Em=i c mn. 
and thus (20) is well approximated by 

<^ 9m (0«) « (\ n Af(e n ;0,al) + (1 - X n )5(9 n )) 

xAf(e n ;C nm ,c n ) (23) 

Cnm = ~}ll^Lm-^ lnZ ln- (24) 

In this case, the mean and variance of ^l + ^ g (•) become 



Mnm Q; n( c n)£n?Ti/(l Tnm) 



„i+l 



i+l\2 



u t+L c l IP 

r"nm ^"nl Sri 



where 



Inm = ^(4)eX P (-C„(c;)(a m ) 2 ), 



(25) 
(26) 
(27) 



p n {c) 4 (-l + l/A n )Vl + ^/c 
C„(c) 4 (2c(l + c/al))- 1 . 

According to the sum-product algorithm, p l+1 (9 n \y), the 
posterior on 9 n after SSR's z th -LBP iteration, obeys 



p l+1 (e n \y) ex u fn 



l=l v ~ 



0, (28) 



1=1 " gi ^e n \"n) 

whose mean and variance determine the i th -iteration MMSE 
estimate of 9 n and its variance, respectively. Noting that the 
difference between (28) and (20) is only the inclusion of the 
m th product term, these MMSE quantities become 



Mn +1 = Q «( c n)£ri/(1 + 7n) (29) 
< +1 = lUfC 1 ? + ^VCn (30) 
Cn = EfllAlnZi n (31) 

7* 4 /3„(4)exp(-Cn(cj l )(eJ 2 )- (32) 
Similarly, the posterior on s n after the i th iteration obeys 



Combining 



f +1 (s n \y) oc v l fn ^ Sn (s n )v hn ^ Sn {s n ), (33) 
/ r h /„ i \ where ^ a/ 

Jj^{0]fi q ,v q ) ex AA U; ^7; g (16) "/„-►.,. (*n) « / /„(^,s„)n<^„(^)- (34) 



i=i 



with g m (0) = Af(y m ; a^ n 9, a 2 ), the CLT then implies that 



Since f„(0„, s n ) = sj\f(6 n ;0, a 2 ) + (1 - s n )8($ n ), it can be 
seen that the corresponding log-likelihood ratio (LLR) is 



mn ran 



A ' A 2 



J A 



Urn Eg/n ^-mqAtg 
,i A „2 i A 2 „,i 



(17) 
(18) 



= ln 



1 - At 



(35) 



° 2 + E q *n< q V qm - ^ 



The updates [i % nXk an d u mn can tnen t> e calculated from 

«4£ flra (*») OC I/, B _tf„ (0 n ) «£_> fl „ (<U (20) 

where, using (16), the product term in (20) is 



ocAM 0„! 



(21) 



Clearly, the LLR L 1 ^ 1 and the likelihood function _> 8n (0 
express the same information, but in different ways. 

The procedure described thus far updates 0(MN) variables 
per LBP iteration, which is impractical since MN can be very 
large. In [23], Donoho, Maleki, and Montanari proposed, for 
the i.i.d case, further approximations that yield a "first-order" 
approximate message passing (AMP) algorithm that allows 
the update of only O(N) variables per LBP iteration, essen- 
tially by approximating the differences among the outgoing 
means/variances of the g m {-) nodes (i.e., z l mn and c l mn ) as 
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well as the differences among the outgoing means/variances 
of the 9 n nodes (i.e., p, % n and v^J. These resulting algorithm 
was then rigorously analyzed by Bayati and Montanari in [15]. 
We now summarize a straightforward extension of the i.i.d 
AMP algorithm from [23] to the case of an independent but 
non-identical Bernoulli-Gaussian prior (13): 



V M A 



mn m 

< +1 = G n (C n ;c*) 



(36) 
(37) 
(38) 



Vra - En=l Ann/4, + & En=! C*) (39) 



" ' M L~m = l "n ' 

where F n (.; .), G„(.; .) and F' n {.; .) are defined as 

a n (c) 



K&c) = 

Tn(C;c) = 



1+T„(fcc) € 

T„(e;c)F„(^;c) 2 + c rX(^;c) 
Qn(c)[l + r 7t (e;c)(l + 2C(c»)g 2 )] 

[l + r„(C;c)] 2 
/3„(c) exp(-Cn(c)e 2 )- 



(41) 
(42) 
(43) 
(44) 



For the first turbo iteration (i.e., t 



1), we initialize 

1 „2 



AMP using z 1 - 1 = y m , p}- 1 = 0, and c l ~ L > a n for 
all m, n. For subsequent turbo iterations (i.e., t > 1), we 



initialize AMP by setting z\ 



=1 equal to the final 



values of z % m , p l n , c l n generated by AMP at the previous turbo 
iteration. We terminate the AMP iterations as soon as either 
— M* _1 ||2 < 10~ 5 or a maximum of 10 AMP iterations 
have elapsed. Similarly, we terminate the turbo iterations as 
soon as either || /x 1 *- 1 — M (t_1) II 2 < 10~ 5 a maximum of 10 turbo 



iterations have elapsed. The final value of fi = [/ii, 
is output at the signal estimate 6. 



D. Learning the Statistical Parameters 

We now describe how the precisions {pj} are learned. First, 
we recall that pj describes the apriori precision on the active 
coefficients at the j th level, i.e., on {0 n } n es,, where the 
corresponding index set Sj = {n 6 Wj : s n = 1} is of 
size Kj = Furthermore, we recall that the prior on pj 
was chosen as in (4). Thus, ;/ we had access to the true values 
{6 n }neSj, then (2) implies that 



I tie 5,-) = Af(6 n ;0, Pj ), 



(45) 



of Gamma(dj , bj ) where Qj 



\K 3 and bj = b 3 



\ EnG5 ^n* ^ n P ract i ce - we don't have access to the true 
values {6 n }n£s. nor to the set Sj, and thus we propose to 
build surrogates from the SSR outputs. In particular, to update 
Pj after the t th turbo iteration, we employ 



4 {n G 



L<<> > 0} 



iff 



|5f| 



(46) 
(47) 



and {^t ( ll f) } jigS ( t ) , where L\\ ] and /i^ denote the final LLR on 

s n and the final MMSE estimate of 8 n , respectively, at the t th 
turbo iteration. These choices imply the hyperparameters 



~ (t+i) 

a) = aj 



1 K M 
11 



= bj + kZ 



(48) 
(49) 



Finally, to perform SSR at turbo iteration t + 1, we set the 
variances {cr n } n eWj equal to the inverse of the expected 
precisions, i.e., l/E{Oj} = ¥* +1 '/af . The noise variance 
(40) c is learned similarly from the SSR-estimated residual. 

Next, we describe how the transition probabilities {""j 1 } are 
learned. First, we recall that ttj 1 describes the probability that 
a child at level j+1 is active (i.e., s n = 1) given that his parent 
(at level j) is active. Furthermore, we recall that the prior on 



7rj was chosen as in (7). Thus // we knew that there were Kj 
active coefficients at level j, of which Cj had active children, 
then 4 the posterior on 7rj 1 would take the form of Bcta(cj , dj), 
where Cj = Cj + Cj and dj = dj + Kj — Cj. In practice, we 
don't have access to the true values of Kj and Cj, and thus we 
build surrogates from the SSR outputs. In particular, to update 
71-j 1 after the t th turbo iteration, we approximate s n = 1 by 
the event > 0, and based on this approximation set K ( p 
(as in (47)) and Cj . The corresponding hyperparameters are 
then updated as 



= d i 



Cf 

■Kf-Cf. 



(50) 
(51) 



Finally, to perform SSR at turbo iteration t + 1, we set 
the transition probabilities 71-j 1 equal to the expected value 
c< t+1) /(cj t+1) + The parameters Tr^, 7^, and {tt™} 

are learned similarly. 



E. The Two-State Gaussian-Mixture Model 

Until now, we have focused on the Bernoulli-Gaussian (BG) 
signal model (2). In this section, we describe the modifications 
needed to handle the Gaussian mixture (GM) model 



which implies 3 that the posterior on pj would take the form P\ 



s n M{0 n ;O,a 2 n L ) + (1 - Sn )N(9 n ;0,al 



(52) 



where a n L denotes the variance of "large" coefficients and 
er 2 s denotes the variance of "small" ones. For either the BG 
or GM prior, AMP is performed using the steps (36)-(40). 
For the BG case, the functions F n (.; .), G„(.; .), F n (.\ .), and 
r„(.; .) are given in (41)-(44), whereas for the GM case, they 



3 This posterior results because the chosen prior is conjugate [16] for the 
likelihood in (45). 



4 This posterior results because the chosen prior is conjugate to the 
Bernoulli likelihood [16]. 
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take the form 

l + r„(£,c) 



(53) 



n (c \ t «(^ c )^ 2 K,l( c ) - a n,s( c )? , t -i P /t \ 
Gn{t,\c) = m i - it M2 h c ? F «(C;c) 



[l + f„(£, C )] 2 
c)a n . L (c)(l + f n (& c) - 2g 2 C„(c)) 
[i + f n (£ ;c )] 2 
, a„, s (c)(l + f n (£; c) + 2C 2 Cn(c)f„(C; c)) 



[i + f„(£ ;c )] 2 
r«(£,c) = /3„(c)exp(-C„(c)C 2 ), 
where 

9 

- / \ a a n,L 
a n ,L(c) = — ; o - 



/ \ A n,o 

a„,s(c) = 



n, S 



Cn(c) = 



1-A n / C + <L 
A„ V C + (T 2 s 

2 2 

<L ~ <S 

2 ( c + <l)( c + <s)' 



(54) 

(55) 
(56) 

(57) 
(58) 
(59) 
(60) 



Likewise, for the BG case, the extrinsic LLR is given by (35), 
whereas for the GM case, it becomes 



= ln 



1 - A? 



IV. Numerical Results 



(61) 



A. Setup 

The proposed turbo 5 approach to compressive imaging was 
compared to several other tree-sparse reconstruction algo- 
rithms: ModelCS [3], HMT+IRWL1 [4], MCMC [5], varia- 
tional Bayes (VB) [6]; and to several simple-sparse recon- 
struction algorithms: CoSaMP [7], SPGL1 [25], and Bernoulli- 
Gaussian (BG) AMP. All numerical experiments were per- 
formed on 128 x 128 (i.e., TV = 16384) grayscale images 
using a J = 4-level 2D Haar wavelet decomposition, yielding 
8 2 = 64 approximation coefficients and 3x8 2 = 192 individual 
Markov trees. In all cases, the measurement matrix <I> had 
i.i.d Gaussian entries. Unless otherwise specified, M = 5000 
noiseless measurements were used. We used normalized mean 
squared error (NMSE) \\x — £ HI/ II 31 II 2 as tne performance 
metric. 

We now describe how the hyperparameters were chosen for 
the proposed Turbo schemes. Below, we use Nj to denote 
the total number of wavelet coefficients at level j, and 7V_i 
to denote the total number of approximation coefficients. 
For both Turbo-BG and Turbo-GM, the Beta hyperparam- 
eters were chosen so that c + d = No, Q + d = N-i and 
Cj + dj = Nj Vj with E{p^} = l/N, E{pl-,} = 1 - 10~ 6 , 
E{p° } = 1/N Vj, and Ejjj} 1 } = 0.5 Vj. These informative 
hyperparameters are similar to the "universal" recommenda- 
tions in [26] and, in fact, identical to the ones suggested in 

5 An implementation of our algorithm can be downloaded from 
http://www.ece. osu.edu/~ schniter/ turboAMPimaging 



the MCMC work [5]. For Turbo-BG, the hyperparameters for 
the signal precisions {(T~ 2 } ng Wj were set to aj = 1 Vj and 
[6_i, . . . , 63] = [10,1,1,0.1,0.1]. This choice is motivated 
by the fact that wavelet coefficient magnitudes are known 
to decay exponentially with scale j (e.g., [26]). Meanwhile, 
the hyperparameters for the noise precision cr~ 2 were set 
to (a, b) = (1,10~ 6 ). Although the measurements were 
noiseless, we allow Turbo-BG a nonzero noise variance in 
order to make up for the fact that the wavelet coefficients 
are not exactly sparse, as assumed by the BG signal model. 
(We note that the same was done in the BG-based work 
[5], [6].) For Turbo-GM, the hyperparameters {a^\_,b^y) for 
the signal precisions {ff^Jnew were set at the values of 
(a,j, bj) for the BG case, while the hyperparameters (oj.s, ^j,s) 
for {c^sj-neWj were set as a^s = 1 and fo^g = IO^&^l- 
Meanwhile, the noise variance a 2 was assumed to be exactly 
zero, because the GM signal prior is capable of modeling non- 
sparse wavelet coefficients. 

For MCMC [5], the hyperparameters were set in accordance 
with the values described in [5]; the values of c,d,{cj,dj} 
are same as the ones used for the proposed Turbo-BG 
scheme, while a = cij = b = bj = 10~ 6 Vj. For VB, 
the same hyperparameters as MCMC were used except for 
a,j = 2 x 10 9 and bj = 10 10 Vj, which were the default values 
of hyperparameters used in the publicly available code. 6 We 
experimented with the values for both MCMC and VB and 
found that the default values indeed seem to work best. For 
example, if one swaps the (aj,bj) hyperparameters between 
VB and MCMC, then the average performance of VB and 
MCMC degrade by 1.69dB and 1.55dB, respectively, relative 
to the values reported in Table I. 

For both the CoSaMP and ModelCS algorithms, the prin- 
cipal tuning parameter is the assumed number of non-zero 
coefficients. For both ModelCS (which is based on CoSaMP) 
and CoSaMP itself, we used the Rice University codes, 7 which 
include a genie-aided mechanism to compute the number of 
active coefficients from the original image. However, since 
we observed that the algorithms perform somewhat poorly 
under that tuning mechanism, we instead ran (for each image) 
multiple reconstructions with the number of active coefficients 
varying from 200 to 2000 in steps of 100, and reported the 
result with the best NMSE. The number of active coefficients 
chosen in this manner was usually much smaller than that 
chosen by the genie-aided mechanism. 

To implement BG-AMP, we used the AMP scheme de- 
scribed in Section III-C with the hyperparameter learning 
scheme described in Section III-D; HMT structure was not 
exploited. For this, we assumed that the priors on variance 
<T 2 and activity A n were identical over the coefficient index 
71, and assigned Gamma and Beta hyperpriors of (a, b) = 
(10- 10 ,10- 10 ) and (c,d) = (0.17V, 0.97V), respectively. 

For HMT+IRWL1, we ran code provided by the authors 
with default settings. For SPGL1, 8 the residual variance was 
set to 0, and all parameters were set at their defaults. 



http://people.ee.duke.edU/~ lcmin/BCS.html 

7 http://dsp.rice.edu/software/model-based-compressive-sensing-toolboK 
8 nttp:// www.es. ubc.ca/labs/ scl/ spgll/ index.html 
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Original 



HMT+IRWL1 



ModelCS 



BG-AMP 




■ nn m.\ bia * 

Fig. 4. Reconstruction from M = 5000 observations of a 128 X 128 (i.e., N = 16384) section of the cameraman image using i.i.d Gaussian 




Fig. 5. A sample image from each of the 20 types in the Microsoft database. 
Image statistics were found to vary significantly from one type to another. 



Algorithm 


NMSE (dB) 


Computation Time (sec) 


HMT+IRWLl 


-14.37 


363 


CoSaMP 


-16.90 


25 


ModelCS 


-17.45 


117 


BG-AMP 


-17.84 


68 


SPGLl 


-18.06 


536 


VB 


-19.04 


107 


MCMC 


-20.10 


742 


Turbo-BG 


-20.49 


51 


Turbo-GM 


-20.74 


51 



TABLE I 

NMSE AND RUNTIME AVERAGED OVER 591 IMAGES. 



B. Results 

Fig. 4 shows a 128x128 section of the "cameraman" image 
along with the images recovered by the various algorithms. 
Qualitatively, we see that CoSaMP, which leverages only 
simple sparsity, and ModelCS, which models persistence- 
across-scales (PAS) through a deterministic tree structure, 
both perform relatively poorly. HMT+IRWLl also performs 
relatively poorly, due to (we believe) the ad-hoc manner in 
which the HMT structure was exploited via iteratively re- 
weighted £i. The BG-AMP and SPGLl algorithms, neither 
of which attempt to exploit PAS, perform better. The HMT- 
based schemes (VB, MCMC, Turbo-GM, and Turbo-GM) all 
perform significantly better, with the Turbo schemes perform- 
ing best. 

For a quantitative comparison, we measured average perfor- 
mance over a suite of images in a Microsoft Research Object 



CD 




10 12 

Image type 



Fig. 6. Average NMSE for each image type. 



Class Recognition database 9 that contains 20 types of images 
(see Fig. 5) with roughly 30 images of each type. In particular, 
we computed the average NMSE and average runtime on a 
2.5 GHz PC, for each image type. These results are reported 
in Figures 6 and 7, and the global averages (over all 591 
images) are reported in Table I. From the table, we observe 
that the proposed Turbo algorithms outperform all the other 
tested algorithms in terms of reconstruction NMSE, but are 
beaten only by CoSaMP in speed. 10 Between the two Turbo 
algorithms, we observe that Turbo-GM slightly outperforms 
Turbo-BG in terms of reconstruction NMSE, while taking the 

' We used 128 X 128 images extracted from 
the "Pixel-wise labelled image database v2" at 
http://research.microsoft. com/en- us/ 'projects/ 'objectclasswcognition . What 
we refer to as an "image type" is a "row" in this database. 

10 The CoSaMP runtimes must be interpreted with caution, because the 
reported runtimes correspond to a single reconstruction, whereas in practice 
multiple reconstructions may be needed to determined the best value of the 
tuning parameter. 



Image type 

Fig. 7. Average runtime for each image type. 
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Number of measurements (M) 

Fig. 8. Average NMSE for images of type 1. 



same runtime. In terms of NMSE performance, the closest 
competitor to the Turbo schemes is MCMC, 11 whose NMSE 
is 0.39dB worse than Turbo-BG and 0.65dB worse than Turbo- 
GM. The good NMSE performance of MCMC comes at the 
cost of complexity, though: MCMC is 15 times slower than 
the Turbo schemes. The second closest NMSE-competitor 
is VB, showing performance 1.5 dB worse than Turbo-BG 
and 1.7dB worse than Turbo-GM. Even with this sacrifice 
in performance, VB is still twice as slow as the Turbo 
schemes. Among the algorithms that do not exploit PAS, 
we see that SPGL1 offers the best NMSE performance, but 
is by far the slowest (e.g., 20 times slower than CoSaMP). 
Meanwhile, CoSaMP is the fastest, but shows the worst NMSE 
performance (e.g., 1.16dB worse than SPGL1). BG-AMP 
strikes an excellent balance between the two: its NMSE is 
only 0.22dB away from SPGL1, whereas it takes only 2.7 
times as long as CoSaMP. However, by combining the AMP 
algorithm with HMT structure via the turbo approach, it is 

11 The MCMC results reported here are for the defaults settings: 100 
MCMC iterations and 200 burn-in iterations. Using 500 MCMC iterations 
and 200 burn-in iterations, we obtained an average NMSE of — 20.22dB (i.e., 
0.12dB better) at an average runtime of 1958 sec (i.e., 2.6x slower). 



Number of measurements (M) 
Fig. 9. Average runtime for images of type 1. 



possible to significantly improve NMSE while simultaneously 
decreasing the runtime. The reason for the complexity decrease 
is twofold. First, the HMT structure helps the AMP and 
parameter-learning iterations to converge faster. Second, the 
HMT steps are computationally negligible relative to the AMP 
steps: when, e.g., M = 5000, the AMP portion of the turbo 
iteration takes approximately 6 sec while the HMT portion 
takes 0.02 sec. 

We also studied NMSE and compute time as a function of 
the number of measurements, M. For this study, we examined 
images of Type 1 at M = 2500, 5000, 7500, 10000, 12500. In 
Figure 8, we see that Turbo-GM offers the uniformly best 
NMSE performance across M. However, as M decreases, 
there is little difference between the NMSEs of Turbo-GM, 
Turbo-CS, and MCMC. As M increases, though, we see that 
the NMSEs of MCMC and VB converge, but that they are 
significantly outperformed by Turbo-GM, Turbo-CS, and — 
somewhat surprisingly — SPGL1. In fact, at M = 12500, 
SPGL1 outperforms Turbo-BG, but not Turbo-GM. However, 
the excellent performance of SPGL1 at these M comes at the 
cost of very high complexity, as evident in Figure 9. 

V. Conclusion 

We proposed a new approach to HMT-based compressive 
imaging based on loopy belief propagation, leveraging a turbo 
message passing schedule and the AMP algorithm of Donoho, 
Maleki, and Montanari. We then tested our algorithm on a suite 
of 591 natural images and found that it outperformed the state- 
of-the-art approach (i.e., variational Bayes) while halving its 
runtime. 
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